fix: Update server capacity calculation #6163

Merged
AyshaHakeem merged 11 commits into frappe:develop from AyshaHakeem:fix-site-or-bench-placement
Apr 24, 2026

@AyshaHakeem AyshaHakeem commented Apr 17, 2026

Update the logic by which the use_for_new_benches and use_for_new_sites server flags are refreshed:

  • Consider active, public primary servers for each cluster
  • Fetch available memory and available vCPU for all servers in bulk from Prometheus
  • Mark the server with the most available memory for new benches and the one with the most available vCPU for new sites
    • If the server shortlisted for deploying release groups doesn't have sufficient memory, create an incident record
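A minimal sketch of the selection steps listed above, not the code in this PR. It assumes the bulk metrics arrive as Prometheus instant-vector responses keyed by an `instance` label; the helper names `parse_instant_vector` and `pick_servers`, and the `min_bench_memory` threshold, are hypothetical.

```python
def parse_instant_vector(response, label="instance"):
    """Map each series' label value to its sample value as a float."""
    values = {}
    for series in response["data"]["result"]:
        # Each instant-vector entry looks like:
        # {"metric": {...labels...}, "value": [timestamp, "123.4"]}
        values[series["metric"][label]] = float(series["value"][1])
    return values


def pick_servers(memory, vcpu, min_bench_memory):
    """Return (bench_server, site_server, needs_incident) for one cluster.

    memory and vcpu map server name -> available amount.
    min_bench_memory is an assumed threshold; the PR does not state one.
    """
    bench_server = max(memory, key=memory.get) if memory else None
    site_server = max(vcpu, key=vcpu.get) if vcpu else None
    # Flag an incident when even the best bench candidate is short on memory.
    needs_incident = (
        bench_server is not None and memory[bench_server] < min_bench_memory
    )
    return bench_server, site_server, needs_incident
```

Fetching both metrics once per refresh and choosing with `max` keeps the job at two queries regardless of how many servers a cluster has.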

Misc:

  • Update related tests
  • Add validation in the server class to ensure one public server exists in the cluster for site and bench creation:
    • These flags are maintained by refresh_new_bench_and_site_server_pool background job.
    • This validation prevents site/bench creation failures for newly created clusters before the job runs.
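One possible reading of this validation, sketched with plain dictionaries rather than Frappe documents. The function name `ensure_flagged_server` and the `public` / `name` keys are assumptions for illustration; the actual check in server.py may differ.

```python
def ensure_flagged_server(cluster_servers, flag):
    """Make sure at least one public server in the cluster carries `flag`.

    Returns the name of the flagged server, or None if the cluster
    has no public server at all.
    """
    public = [s for s in cluster_servers if s.get("public")]
    if not public:
        return None
    for server in public:
        if server.get(flag):
            # The background job (or an earlier save) already picked one.
            return server["name"]
    # New cluster, job has not run yet: flag the first public server.
    public[0][flag] = True
    return public[0]["name"]
```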

@AyshaHakeem AyshaHakeem requested a review from adityahase as a code owner April 17, 2026 07:29

codecov Bot commented Apr 17, 2026

Codecov Report

❌ Patch coverage is 44.95413% with 60 lines in your changes missing coverage. Please review.
✅ Project coverage is 49.06%. Comparing base (3fc42f0) to head (891673f).
⚠️ Report is 130 commits behind head on develop.

Files with missing lines | Patch % | Lines
press/press/doctype/server/server.py | 34.48% | 57 Missing ⚠️
...ess/doctype/site_group_deploy/site_group_deploy.py | 0.00% | 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #6163      +/-   ##
===========================================
- Coverage    55.78%   49.06%   -6.73%     
===========================================
  Files          931      910      -21     
  Lines        76751    75838     -913     
  Branches       525      353     -172     
===========================================
- Hits         42814    37208    -5606     
- Misses       33909    38606    +4697     
+ Partials        28       24       -4     
Flag Coverage Δ
dashboard 60.73% <ø> (-30.02%) ⬇️


@AyshaHakeem AyshaHakeem force-pushed the fix-site-or-bench-placement branch from a281b5e to 7082113 Compare April 19, 2026 11:54
@AyshaHakeem AyshaHakeem requested review from Copilot and removed request for adityahase April 19, 2026 11:54

Copilot AI left a comment

Pull request overview

Updates the heuristic used by Press to pick the least-loaded public primary server per cluster for use_for_new_sites / use_for_new_benches, normalizing site CPU load by server vCPU capacity.

Changes:

  • Updates the scoring definition to use (sum(site_plan.cpu_time_per_day for non-archived/suspended sites)) / server_vcpu.
  • Adds a QueryBuilder query to fetch server vCPU capacity (via Server Plan) with a fallback to 1.
  • Ensures every server in server_names receives a score entry (including servers with no eligible sites).
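The scoring described in this review overview can be sketched as follows. This is an illustration of the formula only: the function name `score_servers`, the dictionary shapes, and the status strings `"Archived"` and `"Suspended"` are assumptions, and the PR itself builds the data with QueryBuilder queries.

```python
from collections import defaultdict


def score_servers(server_names, sites, vcpu_by_server):
    """Score = sum of cpu_time_per_day of eligible sites / server vCPU."""
    cpu_total = defaultdict(float)
    for site in sites:
        # Archived and suspended sites do not count toward load.
        if site["status"] in ("Archived", "Suspended"):
            continue
        cpu_total[site["server"]] += site["cpu_time_per_day"]

    scores = {}
    for name in server_names:
        # Fall back to 1 vCPU when the plan gives no capacity.
        vcpu = vcpu_by_server.get(name) or 1
        # Servers with no eligible sites still get an entry (score 0.0).
        scores[name] = cpu_total[name] / vcpu
    return scores
```

A lower score means less planned CPU time per vCPU, so the least-loaded server is `min(scores, key=scores.get)`.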

Comment thread press/press/doctype/server/server.py Outdated
Comment thread press/press/doctype/server/server.py Outdated
Comment thread press/press/doctype/server/server.py Outdated
@AyshaHakeem AyshaHakeem force-pushed the fix-site-or-bench-placement branch from 7082113 to 27bd19c Compare April 20, 2026 05:53
@AyshaHakeem AyshaHakeem requested a review from phot0n April 23, 2026 05:44
@AyshaHakeem AyshaHakeem force-pushed the fix-site-or-bench-placement branch 2 times, most recently from 5eb41b0 to e98a603 Compare April 23, 2026 07:07
- Update heuristic to set use_for_new_sites and use_for_new_benches flags in server record:
  - Weigh the sum of site plan cpu_time_per_day across all sites against server vCPU capacity
- Consider active, public primary servers for each cluster
- Fetch available memory and available vCPU for all servers in bulk from Prometheus
- Mark the server with most memory for new benches and the one with most vCPU for new sites
…nch creation:

- These flags are maintained by refresh_new_bench_and_site_server_pool background job.
- This validation prevents site/bench creation failures for newly created clusters before the job runs.
- Prefer use_for_new_sites and use_for_new_benches via ordering instead of filtering to avoid selection failures
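The ordering-over-filtering idea in the last commit message can be illustrated like this. It is a sketch over plain dictionaries, with a hypothetical `choose_server` helper; the real code orders a database query instead.

```python
def choose_server(candidates, flag):
    """Prefer a server with `flag` set, but never fail just because none has it."""
    if not candidates:
        return None
    # False sorts before True, so flagged servers come first;
    # the name is a tie-breaker to keep the choice deterministic.
    ordered = sorted(
        candidates, key=lambda s: (not s.get(flag, False), s["name"])
    )
    return ordered[0]["name"]
```

With a filter, a cluster where no server carries the flag yields no result; with ordering, the same cluster still returns a usable server.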
@AyshaHakeem AyshaHakeem force-pushed the fix-site-or-bench-placement branch from a11590d to 891673f Compare April 24, 2026 07:52
@AyshaHakeem AyshaHakeem merged commit c23239c into frappe:develop Apr 24, 2026
13 of 15 checks passed
@frappe-pr-bot

🎉 This PR is included in version 0.20.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

3 participants